Speech Enhancement with Applications in Speech Recognition
Abstract
The objective of this research is to develop feature compensation techniques that make automatic speech recognition (ASR) systems more robust to noise distortion. The research is important because the performance of ASR systems degrades dramatically in adverse environments, which greatly limits the deployment of speech recognition applications. In this report, we aim to build a generic framework for feature compensation that improves recognition accuracy by making speech features less sensitive to noise. The degradation of ASR systems under noisy conditions is due to the mismatch between the acoustic models, which are trained on clean speech, and the noisy test features presented to the recognition engine. Two general approaches have been proposed to reduce this mismatch: the first adapts the acoustic model to the noisy test features, and the second compensates the noisy test features prior to recognition. We review existing techniques for noise-robust speech recognition and find that they generally ignore the inter-frame information of the speech signal. We believe, however, that inter-frame statistics can contribute to the compensation of noisy speech features, and we therefore propose a vector autoregressive (VAR) model of speech feature vectors that reconstructs a feature vector by prediction from either past or future frames. We propose two feature compensation schemes based on the VAR model and missing feature theory (MFT). Experiments are carried out using the ground-truth data mask on the AURORA-2 database, and the results show a significant improvement in recognition accuracy. Specifically, at a signal-to-noise ratio (SNR) of -5 dB, the relative error rate reductions with respect to the baseline are 86.51% for the subway noise case of test set A and 93.9% for the restaurant noise case of test set B.
The proposed VAR modeling framework is a promising research direction and we will conduct further research to exploit the full potential of this technique.
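To make the idea of VAR-based reconstruction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a first-order model x[t] ≈ A x[t-1] fitted by least squares on clean training frames, forward (past-frame) prediction only, and a binary reliability mask in the spirit of MFT. The function names `solve`, `fit_var1`, and `reconstruct`, the model order, and the rule of replacing unreliable components with their predicted values are all illustrative assumptions; the abstract does not specify these details.

```python
def solve(m, b):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_var1(frames):
    """Least-squares estimate of A in x[t] ~ A x[t-1] (assumed VAR order 1)."""
    d = len(frames[0])
    c0 = [[0.0] * d for _ in range(d)]  # sum of x[t-1] x[t-1]^T
    c1 = [[0.0] * d for _ in range(d)]  # sum of x[t-1] x[t]^T
    for prev, cur in zip(frames, frames[1:]):
        for i in range(d):
            for j in range(d):
                c0[i][j] += prev[i] * prev[j]
                c1[i][j] += prev[i] * cur[j]
    a_t = solve(c0, c1)  # normal equations: C0 A^T = C1
    return [[a_t[j][i] for j in range(d)] for i in range(d)]


def reconstruct(frames, mask, a):
    """Replace unreliable components (mask False) with the forward VAR prediction.

    The first frame is assumed reliable and is copied unchanged.
    """
    out = [list(frames[0])]
    for x, m in zip(frames[1:], mask[1:]):
        pred = [sum(c * p for c, p in zip(row, out[-1])) for row in a]
        out.append([xi if ok else pi for xi, ok, pi in zip(x, m, pred)])
    return out


if __name__ == "__main__":
    # Toy 2-dimensional "features" generated by a known (hypothetical) matrix.
    true_a = [[0.9, 0.1], [-0.2, 0.8]]
    clean = [[1.0, 0.5]]
    for _ in range(9):
        clean.append([sum(c * p for c, p in zip(row, clean[-1])) for row in true_a])
    a_hat = fit_var1(clean)
    noisy = [list(f) for f in clean]
    noisy[4] = [5.0, -5.0]                      # corrupt one frame
    mask = [[True, True] for _ in clean]
    mask[4] = [False, False]                    # mark it unreliable
    print(reconstruct(noisy, mask, a_hat)[4], clean[4])
```

A backward (future-frame) predictor could be sketched the same way by fitting the model on the time-reversed frame sequence. In practice the reliability mask would have to be estimated from the noisy signal; the experiments described above use the ground-truth mask.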
Similar resources
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
A Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement
A reliable speech enhancement method is important for speech applications as a pre-processing step to improve their overall performance. In this paper, we propose a novel frequency domain method for single channel speech enhancement. Conventional frequency domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Publication date: 2006